Grammar Acquisition Based on Clustering Analysis and Its Application to Statistical Parsing
Abstract
This paper proposes a new method for learning a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus based on clustering analysis, and describes a natural language parsing model that uses a probability-based scoring function of the grammar to rank the parses of a sentence. By grouping the brackets in a corpus into a number of similar bracket groups based on their local contextual information, the corpus is automatically labeled with nonterminal labels, and consequently a grammar with conditional probabilities is acquired. The statistical parsing model provides a framework for finding the most likely parse of a sentence based on these conditional probabilities. Experiments using Wall Street Journal data show that our approach achieves a relatively high accuracy: 88% recall, 72% precision and 0.7 crossing brackets per sentence for sentences shorter than 10 words, and 71% recall, 51% precision and 3.4 crossing brackets for sentences of 10 to 19 words. This result supports the assumption that local contextual statistics obtained from an unlabeled bracketed corpus are effective for learning a useful grammar and for parsing.
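The abstract reports recall, precision and crossing brackets without defining them. The sketch below assumes the usual unlabeled bracket-matching definitions: a predicted bracket counts as correct when the same word span appears in the reference parse, and it counts as crossing when it overlaps a reference bracket without either containing the other. The span encoding, the function names and the five-word example are illustrative assumptions, not taken from the paper.

```python
from typing import Set, Tuple

# A bracket is a (start, end) pair of word positions, end exclusive (assumed encoding).
Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """True if the two spans overlap without one containing the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def bracket_scores(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, int]:
    """Return (recall, precision, crossing brackets) for a single sentence."""
    matched = len(predicted & gold)
    recall = matched / len(gold) if gold else 0.0
    precision = matched / len(predicted) if predicted else 0.0
    # Count predicted brackets that cross at least one reference bracket.
    crossing = sum(1 for p in predicted if any(crosses(p, g) for g in gold))
    return recall, precision, crossing


# Hypothetical 5-word sentence.
# Reference: ((w0 w1) (w2 (w3 w4)))   Predicted: ((w0 w1) ((w2 w3) w4))
gold = {(0, 5), (0, 2), (2, 5), (3, 5)}
predicted = {(0, 5), (0, 2), (2, 5), (2, 4)}
recall, precision, crossing = bracket_scores(predicted, gold)
```

In this example three of the four brackets match on each side, and the predicted bracket (2, 4) crosses the reference bracket (3, 5). Corpus-level figures such as those quoted in the abstract would be obtained by aggregating these per-sentence counts; whether the paper micro- or macro-averages is not stated in the abstract.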
Similar resources
KEY WORDS-Statistical Parsing, Grammar Acquisition, Clustering Analysis, Local Contextual
This paper proposes a new method for learning a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus based on clustering analysis and describes a natural language parsing model which uses a probability-based scoring function of the grammar to rank parses of a sentence. By grouping brackets in a corpus into a number of similar bracket groups based on ...
Grammar Acquisition and Statistical Parsing by exploiting Local Contextual Information
This paper presents a method for inducing a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus using local contextual information and describes a natural language parsing model which uses a probability-based scoring function of the grammar to rank parses of a sentence. This method uses clustering techniques to group brackets in a corpus into a numbe...
Statistical Parsing with a Grammar Acquired from a Bracketed Corpus Based on Clustering Analysis
This paper proposes a new method for learning a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus based on clustering analysis and describes a natural language parsing model which uses a probability-based scoring function of the grammar to rank parses of a sentence. By grouping brackets in a corpus into a number of similar bracket groups based on ...
Statistical Parsing with a Grammar Acquired from a Bracketed Corpus Based on Clustering Analysis
This work proposes a new method for learning a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus based on clustering analysis, and introduces a natural language parsing model which uses a probability-based scoring function of the grammar to rank parses of a sentence. The method is superior to previous works (i.e., [Collins, 1996]) in the followi...
Learning for Semantic Parsing Using Statistical Machine Translation Techniques
Semantic parsing is the construction of a complete, formal, symbolic meaning representation of a sentence. While it is crucial to natural language understanding, the problem of semantic parsing has received relatively little attention from the machine learning community. Recent work on natural language understanding has mainly focused on shallow semantic analysis, such as word-sense disambiguat...
Publication date: 1997